
Algorithms for Binary Neural Networks

TABLE 3.4
With different λ and θ, we evaluate the accuracies of BONNs based on WRN-22 and WRN-40 on CIFAR-10/100. When varying λ, the Bayesian feature loss is not used (θ = 0). When varying θ, we choose the optimal loss weight (λ = 1e4) for the Bayesian kernel loss.

Hyper-param.      WRN-22 (BONN)          WRN-40 (BONN)
                CIFAR-10  CIFAR-100    CIFAR-10  CIFAR-100
λ    1e3         85.82     59.32        85.79     58.84
     1e4         86.23     59.77        87.12     60.32
     1e5         85.74     57.73        86.22     59.93
     0           84.97     55.38        84.61     56.03
θ    1e2         87.34     60.31        87.23     60.83
     1e3         86.49     60.37        87.18     61.25
     1e4         86.27     60.91        87.41     61.03
     0           86.23     59.77        87.12     60.32

3.7.7 Ablation Study

Hyper-Parameter Selection In this section, we evaluate the effects of hyperparameters

on BONN performance, including λ and θ. The Bayesian kernel loss and the Bayesian

feature loss are balanced by λ and θ, respectively, to adjust the distributions of kernels and

features in a better form. WRN-22 and WRN-40 are used. The implementation details are

given below.
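Although the individual losses are defined earlier in the chapter, the way λ and θ enter the training objective can be sketched as a simple weighted sum. The function below is illustrative only: the loss values are passed in as plain numbers, and the term names follow the text rather than any exact implementation.

```python
def total_loss(ce_loss, kernel_loss, feature_loss, lam=1e4, theta=1e3):
    """Combine the three terms as described in the text: cross-entropy
    plus the Bayesian kernel loss weighted by lambda and the Bayesian
    feature loss weighted by theta. The exact loss definitions are
    given elsewhere in the chapter; this only shows the balancing."""
    return ce_loss + lam * kernel_loss + theta * feature_loss
```

With the optimal weights reported in the text (λ = 1e4, θ = 1e3), a kernel loss of 1e-5 and a feature loss of 1e-4 each contribute 0.1 to the objective, comparable in scale to a typical cross-entropy value.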

As shown in Table 3.4, we first vary λ with θ set to zero to validate the influence of the Bayesian kernel loss on the kernel distribution. Using the Bayesian kernel loss consistently improves the accuracy on CIFAR-10. However, the accuracy does not increase monotonically with λ, indicating that what is needed is not a larger λ but a properly chosen one that balances the cross-entropy loss against the Bayesian kernel loss. For example, when λ is set to 1e4, we obtain the best balance and the highest classification accuracy.
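Selecting the loss weight from such an ablation amounts to a small grid search over candidate values. The snippet below is a hypothetical illustration using the WRN-22 CIFAR-10 numbers from Table 3.4:

```python
# Accuracy for each candidate lambda (WRN-22, CIFAR-10, from Table 3.4).
acc_by_lambda = {0: 84.97, 1e3: 85.82, 1e4: 86.23, 1e5: 85.74}

# Pick the weight with the highest validation accuracy.
best_lambda = max(acc_by_lambda, key=acc_by_lambda.get)
print(best_lambda)  # 10000.0
```

The argmax recovers λ = 1e4, the value the text identifies as the optimal balance.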

The hyperparameter θ dominates the intraclass variation of the features, and we likewise investigate the effect of the Bayesian feature loss by varying θ. The results show that the classification accuracy varies with θ much as it does with λ, verifying that the Bayesian feature loss leads to better classification accuracy when a proper θ is chosen.
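The exact Bayesian feature loss is defined earlier in the chapter. As a rough, hypothetical stand-in, a center-loss-style penalty illustrates what "dominating intraclass variation" means in code: the mean squared distance of each feature to its class center.

```python
import numpy as np

def intraclass_variation(features, labels):
    """Center-loss-style sketch of a feature loss that penalizes
    intraclass variation. This is an illustrative stand-in, NOT the
    chapter's exact Bayesian feature loss.
    features: (N, D) array; labels: (N,) integer array."""
    loss = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]   # features belonging to class c
        center = fc.mean(axis=0)     # class center
        loss += ((fc - center) ** 2).sum()
    return loss / len(features)
```

Weighting such a term by θ pulls same-class features toward their centers, which matches the text's description of θ controlling intraclass variation.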

We also compare the convergence of our method with that of its counterparts using ResNet-18 on ImageNet ILSVRC12. As plotted in Fig. 3.22, the XNOR-Net training curve oscillates vigorously, which we suspect is caused by a suboptimal learning process. In contrast, our BONN achieves both better training and better test accuracy.

Effectiveness of Bayesian Binarization on ImageNet ILSVRC12 To better understand the Bayesian losses, we examine how each loss affects performance on the large-scale ImageNet ILSVRC12 dataset. Based on the experiments described earlier, we set λ to 1e4 and θ to 1e3 whenever the corresponding loss is used. As shown in Table 3.5, both the Bayesian kernel loss and the Bayesian feature loss can independently improve the accuracy on ImageNet; when applied together, the Top-1 accuracy reaches its highest value of 59.3%. In Fig. 3.21, we visualize the feature maps of the ResNet-18 model on the ImageNet dataset. They indicate that our method extracts the essential features needed for accurate classification.

TABLE 3.5
Effect of Bayesian losses on the ImageNet dataset. The backbone is ResNet-18.

Bayesian kernel loss     ✗      ✓      ✗      ✓
Bayesian feature loss    ✗      ✗      ✓      ✓
Accuracy  Top-1        56.3   58.3   58.4   59.3
          Top-5        79.8   80.8   80.8   81.6